CSR Data Collection Pilot
نویسندگان
چکیده
The objective of the CSR Corpus Development is to collect and deliver a large corpus of continuous speech data to support DARPA research efforts in continuous speech recognition (CSR). The CSR corpus is intended to be task independent and to consist of speech that is similar to that which would be expected from eventual users of real world CSR systems. Toward these ends, the current pilot collection effort was designed to minimize controls on vocabulary, background noise, and microphones. CSR is also designed to incorporate spontaneous speech.
منابع مشابه
Spontaneous Speech Collection for the CSR Corpus
As part of a pilot data collection for DARPA's Continuous Speech Recognition (CSR) speech corpus, SRI International experimented with the collection of spontaneous speeoh material. The bulk of the CSR pilot data was read versions of news articles from the Wall Street Journal (WSJ), and the spontaneous sentences were to be similar material, but spontaneously dictated. In the first pilot portion ...
متن کاملNIST-DARPA Interagency Agreement: Spoken Language Program
1. To coordinate the design, development and distribution of speech and natural language corpora for the DARPA Spoken Language research community. 2. To design, coordinate implementation, and analyze results, of performance assessment "benchmark tests" for DARPA's speech recognition and spoken language understanding systems. 1. Completed production of the six-CD-ROM-set for ATIS0, and made this...
متن کاملCollection and Analyses of WSJ-CSR Data at MIT
Recently, the DARPA community started a new data collection initiative in the Wall Street Journal (WSJ) domain to support research and development of very large vocabulary continuous speech recognition (CSR) systems. Since August 1991, our group has actively participated in the development of the WSJ-CSR corpus. The purpose of this paper is to document our involvement in this process, from reco...
متن کاملCSR Data Collection
The CSR Development and Evaluation Spokes data collection task yielded 4435 development test utterances fr~n 30 speakers and 4878 evaluation test utterances from a different set of 30 speakers. The development test data covered eight different spoke conditions, each of which had its own distinct combination of subject, prompt text, microphone and recording environment requirements. Similarly, t...
متن کاملCSR Corpus Development
The CSR (Connected Speech Recognition) corpus represents a new DARPA speech recognition technology development initiative to advance the state of the art in CSR. This corpus essentially supersedes the now old Resource Management (RM) corpus that has fueled DARPA speech recognition technology development for the past 5 years. The new CSR corpus supports research on major new problems including u...
متن کامل